Beyond Term Indexing: A P2P Framework for Web Information Retrieval
Abstract
Web search over peer-to-peer (P2P) networks shows promise as an alternative to state-of-the-art search engines, since P2P overlays offer a means for decentralized search across widely distributed document collections. However, designing effective techniques for P2P indexing and retrieval raises a number of technical challenges, because resource consumption (e.g., bandwidth and storage) can grow in an unscalable way. This paper presents a framework for full-text information retrieval in structured P2P networks and introduces a novel retrieval model based on highly discriminative keys (terms and term sets that appear in only a restricted number of documents), which ensure efficient and scalable retrieval. Our goal is to design scalable techniques for building a global key index in structured P2P overlays for large document collections. We present experimental results showing acceptable indexing and retrieval costs, while retrieval quality is comparable to that of standard centralized solutions using the BM25 relevance scoring scheme.
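The key-based model can be illustrated with a small sketch. This is a minimal, hypothetical Python example, not the authors' implementation: it treats a single term or a term pair as a discriminative key when it appears in at most `DF_MAX` documents, forms pairs only from frequent terms that co-occur within a small token window, and assigns each key to a peer by hashing it. The function names (`build_hdk_index`, `peer_for_key`, `distribute`, `query_candidates`), the thresholds, the restriction to pairs, and the modulo hashing that stands in for a structured overlay's routing are all assumptions; ranking the candidate documents (e.g., with BM25) is omitted.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# Hypothetical parameters chosen for the toy example, not values from the paper.
DF_MAX = 1   # a key is discriminative if it occurs in at most DF_MAX documents
WINDOW = 2   # a pair is formed from a term and one of the next WINDOW tokens


def postings_for(docs, keys_of):
    """Map every key produced by keys_of(tokens) to the set of documents containing it."""
    postings = defaultdict(set)
    for doc_id, tokens in docs.items():
        for key in keys_of(tokens):
            postings[key].add(doc_id)
    return postings


def build_hdk_index(docs, df_max=DF_MAX, window=WINDOW):
    """Return {key: sorted doc ids} for discriminative single terms and term pairs."""
    single = postings_for(docs, lambda toks: {frozenset([t]) for t in toks})
    index = {k: sorted(d) for k, d in single.items() if len(d) <= df_max}
    # Only frequent (non-discriminative) terms are combined, so a pair is never
    # generated when one of its terms is already a discriminative key on its own.
    frequent = {next(iter(k)) for k, d in single.items() if len(d) > df_max}

    def pair_keys(tokens):
        pairs = set()
        for i, a in enumerate(tokens):
            if a not in frequent:
                continue
            for b in tokens[i + 1:i + 1 + window]:
                if b in frequent and b != a:
                    pairs.add(frozenset((a, b)))
        return pairs

    pairs = postings_for(docs, pair_keys)
    index.update({k: sorted(d) for k, d in pairs.items() if len(d) <= df_max})
    return index


def peer_for_key(key, num_peers):
    """Stand-in for a DHT lookup: hash the canonical key string to a peer id."""
    digest = hashlib.sha1("|".join(sorted(key)).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_peers


def distribute(index, num_peers):
    """Partition the global key index across peers by key hash."""
    peers = {p: {} for p in range(num_peers)}
    for key, doc_ids in index.items():
        peers[peer_for_key(key, num_peers)][key] = doc_ids
    return peers


def query_candidates(query_terms, peers):
    """Look up every single term and term pair of the query at its responsible peer."""
    terms = sorted(set(query_terms))
    keys = [frozenset([t]) for t in terms] + [frozenset(p) for p in combinations(terms, 2)]
    found = set()
    for key in keys:
        found.update(peers[peer_for_key(key, len(peers))].get(key, []))
    return found


docs = {
    "d1": "peer to peer search engine".split(),
    "d2": "peer to peer file sharing".split(),
    "d3": "web search engine ranking".split(),
    "d4": "peer review process".split(),
}
index = build_hdk_index(docs)
peers = distribute(index, num_peers=4)
print(sorted(query_candidates(["peer", "search"], peers)))  # ['d1']
```

In this sketch, "peer" and "search" are each too frequent to be keys on their own, but the pair {peer, search} occurs in only one document and is therefore indexed. Dropping non-discriminative keys keeps every posting list at no more than `DF_MAX` entries, which bounds the data shipped per lookup; the trade-off is that a query returns results only if it contains at least one discriminative key.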
Similar resources
Using Highly Discriminative Keys for Indexing in a Peer-to-Peer Full-Text Retrieval System
Excessive network bandwidth consumption, caused by the transmission of long posting lists, was identified as one of the major bottlenecks for implementing distributed full-text retrieval in a Peer-to-Peer (P2P) architecture. To address this problem we introduce a novel approach to indexing using highly discriminative terms and term sets, which leads to short posting lists and therefore reduces t...
Building a peer-to-peer full-text Web search engine with highly discriminative keys
Web search engines designed on top of peer-to-peer (P2P) overlay networks show promise to enable attractive search scenarios operating at a large scale. However, the design of effective indexing techniques for extremely large document collections still raises a number of open technical challenges. Resource sharing, self-organization, and low maintenance costs are favorable properties of P2P over...
Bridging the P2P and WWW Divide with DISCOVIR - DIStributed COntent-based Visual Information Retrieval
In the light of image retrieval evolving from text annotation to content-based and from standalone applications to web-based search engines, we foresee the need for deploying content-based image retrieval (CBIR) into Peer-to-Peer (P2P) architecture. By doing so, we not only distribute the tasks of feature extraction, indexing and storage of image data into peers, we also introduce another aspec...
Text Based Approaches for Content Based Image Retrieval in a P2P Network
The tremendous growth of digital multimedia content on the web requires scalable, efficient, and effective information retrieval mechanisms. Handling such large collections of data in a centralized way requires costly high-bandwidth connectivity and powerful servers. This establishes the need for distributed architectures, such as peer-to-peer systems, that allow sharing of data management and s...
A Scalable Semantic Indexing Framework for Peer-to-Peer Information Retrieval
The exponential growth of data demands scalable and adaptable infrastructures for indexing and searching a huge amount of data sources with high accuracy and efficiency. Existing centralized search engines are not scalable and suffer from single points of failure. The recent work on P2P index construction partitions the document vectors either randomly or statically, making it difficult to trade...
Journal title:
- Informatica (Slovenia)
Volume: 30, Issue:
Pages: -
Publication date: 2006